Gross Domestic Product (GDP) is a primary indicator of a nation’s economic performance. Accurate forecasting of GDP plays a crucial role in policy-making, global trade, and investment planning. This research focuses on predicting world GDP using three machine learning algorithms (Linear Regression, Random Forest, and XGBoost) to identify which provides the most reliable results. The dataset includes historical economic data sourced from global institutions such as the World Bank and IMF. Models are evaluated using Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R² score. Experimental findings show that XGBoost achieved the highest accuracy with an R² of 0.90, outperforming the other models. A Streamlit-based web application was developed for interactive visualization and real-time prediction. This paper demonstrates how data-driven learning models can significantly enhance global economic forecasting.
Introduction
Gross Domestic Product (GDP) measures the total value of goods and services produced within a country and is a key indicator of economic health. Predicting world GDP is essential for governments, policymakers, financial institutions, and global organizations like the IMF and World Bank. Accurate GDP forecasting supports economic planning, recession preparedness, and international investment decisions.
With increasing market complexity and large volumes of economic data, traditional statistical methods such as ARIMA and basic regression often fall short. Machine Learning (ML) techniques, in particular tree-based ensembles such as Random Forest and XGBoost, can improve accuracy by identifying complex patterns within multidimensional datasets. Linear Regression is included in this study as a baseline for comparison.
This study compares ML models for predicting world GDP using historical data and key economic indicators to identify the most accurate and reliable forecasting method.
Literature Review Summary
Traditional Approaches:
Classical methods like ARIMA, VAR, and Kalman filters are interpretable and useful for short-term forecasting but struggle with large datasets, nonlinear relationships, and economic shocks (e.g., 2008 crisis, COVID-19).
Rise of ML & Deep Learning:
Since 2015, algorithms such as Random Forests, XGBoost, SVR, and neural networks (MLP, LSTM, RNN) have shown better performance on complex and nonlinear data. The size of the ML advantage depends on dataset size, forecast horizon, and the predictors available.
Nowcasting & Hybrid Models:
Modern studies use high-frequency data—Google Trends, financial indices, mobility data, and satellite imagery—to estimate GDP before official data release. Hybrid models combining ML with econometric methods often outperform single models.
Model Comparison Practices:
Common evaluation metrics include RMSE, MAE, MAPE, and R², along with walk-forward validation. Studies show moderate but consistent accuracy improvements when ML is compared to IMF or central bank forecasts.
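Walk-forward validation can be illustrated with a short sketch. The version below is an expanding-window variant written for this summary, not a procedure taken from any of the reviewed studies. The function name `walk_forward_splits` and its parameters are hypothetical. Each split trains on all observations up to a cut-off and tests on the next `horizon` observations, so the model is never evaluated on data that precedes its training window.

```python
def walk_forward_splits(n_samples, initial_train, horizon=1):
    """Yield (train_indices, test_indices) pairs for expanding-window validation.

    The training window starts with `initial_train` observations and grows by
    `horizon` observations after every split.
    """
    start = initial_train
    while start + horizon <= n_samples:
        train_idx = list(range(start))
        test_idx = list(range(start, start + horizon))
        yield train_idx, test_idx
        start += horizon


# Hypothetical example: 10 yearly observations, the first 6 used for the initial fit.
for train_idx, test_idx in walk_forward_splits(10, initial_train=6, horizon=2):
    print(f"train on 0..{train_idx[-1]}, test on {test_idx}")
```

Averaging an error metric over these splits gives a forecast-style estimate of accuracy, which is usually more conservative than a single random split on time-ordered data.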
Major Challenges:
Structural breaks and crises require frequent retraining
Limited interpretability of complex models
Cross-country differences reduce generalization
Risk of overfitting and data bias
Methodology Summary
Data Collection:
GDP and economic indicators (population, inflation, trade balance, employment) were taken from World Bank and IMF datasets (1990–2024).
Data Preprocessing:
Missing values → mean interpolation
Outliers → IQR method
Categorical features → label encoding
Normalization → Min-Max scaling
Train-test split → 80/20
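The preprocessing steps listed above can be sketched in plain Python. This is a minimal illustration under several assumptions that the summary does not confirm: missing values are filled with the column mean, outliers are clipped to the 1.5 × IQR fences rather than dropped, the 80/20 split is chronological, and the Min-Max bounds are fitted on the training portion only. All function names and the sample series are hypothetical.

```python
import statistics


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]


def clip_outliers_iqr(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] (assumed handling: clip, not drop)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def label_encode(categories):
    """Map each distinct category to an integer code (sorted for determinism)."""
    mapping = {c: i for i, c in enumerate(sorted(set(categories)))}
    return [mapping[c] for c in categories], mapping


def split_ordered(rows, train_fraction=0.8):
    """Chronological split: the first 80% for training, the rest for testing."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def fit_min_max(train_values):
    """Return the (min, max) bounds learned from the training data."""
    return min(train_values), max(train_values)


def apply_min_max(values, lo, hi):
    """Scale values with previously fitted bounds; constant columns map to 0.0."""
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical yearly inflation series (percent); None marks a missing year.
inflation = [2.1, None, 3.4, 2.8, 45.0, 3.0, 2.5, 3.1, 2.9, 3.3]
cleaned = clip_outliers_iqr(fill_missing_with_mean(inflation))
train, test = split_ordered(cleaned)
lo, hi = fit_min_max(train)
train_scaled = apply_min_max(train, lo, hi)
test_scaled = apply_min_max(test, lo, hi)
print(train_scaled, test_scaled)
```

Fitting the scaling bounds on the training portion alone keeps information from the test period out of the features. Test values can then fall slightly outside the range 0 to 1, which is expected.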
Models Implemented:
Linear Regression – baseline model
Random Forest Regressor – ensemble of decision trees
XGBoost Regressor – boosted trees with regularization (handles non-linearity best)
Evaluation Metrics:
RMSE – penalizes large errors
MAE – average prediction error
R² score – proportion of variance explained by the model
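For reference, the three metrics can be computed from their standard definitions as follows. This is a dependency-free sketch, not the study's evaluation code, and the sample values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean Absolute Error: mean of the absolute residuals."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical actual and predicted values, for illustration only.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 380.0]
print(rmse(actual, predicted), mae(actual, predicted), r2_score(actual, predicted))
```

Because RMSE squares each residual before averaging, the single 20-unit error in this example raises RMSE above MAE. This is the sense in which RMSE penalizes large errors.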
Results and Discussion Summary
| Model | RMSE | MAE | R² Score |
|---|---|---|---|
| Linear Regression | 1250.5 | 980.2 | 0.85 |
| Random Forest | 1100.3 | 850.7 | 0.88 |
| XGBoost | 1050.8 | 820.4 | 0.90 |
XGBoost performed the best, achieving the lowest error values and highest R² score, demonstrating strong ability to model complex, nonlinear economic relationships.
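To put the gap between models in relative terms, the reported figures can be compared directly. The sketch below uses only the RMSE, MAE, and R² values reported in this section. The helper `relative_reduction` is a hypothetical name introduced for illustration.

```python
# Metric values as reported in the results of this study.
results = {
    "Linear Regression": {"rmse": 1250.5, "mae": 980.2, "r2": 0.85},
    "Random Forest": {"rmse": 1100.3, "mae": 850.7, "r2": 0.88},
    "XGBoost": {"rmse": 1050.8, "mae": 820.4, "r2": 0.90},
}


def relative_reduction(baseline, candidate):
    """Percentage by which `candidate` is lower than `baseline`."""
    return (baseline - candidate) / baseline * 100


# The best model is the one with the lowest RMSE.
best = min(results, key=lambda name: results[name]["rmse"])

for name, scores in results.items():
    if name == best:
        continue
    rmse_gain = relative_reduction(scores["rmse"], results[best]["rmse"])
    mae_gain = relative_reduction(scores["mae"], results[best]["mae"])
    print(f"{best} vs {name}: RMSE lower by {rmse_gain:.1f}%, MAE lower by {mae_gain:.1f}%")
```

On these figures, XGBoost reduces RMSE by roughly 16% relative to Linear Regression and by roughly 4.5% relative to Random Forest, so most of the gain comes from moving away from the linear baseline.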
Graphical visualizations (scatter plots, line charts, bar charts) further show that XGBoost most accurately matches actual GDP trends and outperforms Linear Regression and Random Forest.